Site Reliability Engineer (with Java)
Technologies: Azure, Kubernetes, Datadog, Python, PostgreSQL, Cloudflare, Ubuntu, Jenkins, GitHub, GitHub Actions, Azure DevOps, and DataDome
Job Description:
- You have proven experience demonstrating hands-on technical excellence and business impact in combining software engineering skills with systems engineering skills to solve complex automation and reliability challenges
- You have deep technical experience with various cloud providers, containerization technologies, automated deployment frameworks, orchestration frameworks, monitoring, logging, alerting, system internals, networking, databases, distributed systems, and service-oriented architecture
- You have the skills to implement load, stress, performance, and reliability testing standards at scale to improve service, platform, and infrastructure resiliency
- You promote openness, diversity of opinions and inclusive discussions at all times to evaluate a wide variety of ideas and perspectives in solving challenging problems
- You demonstrate clear decision-making and good trade-offs in complex situations comprising multiple opinions, needs, teams, technologies, cloud providers, and architectural settings
- You communicate effectively with stakeholders ranging from executives to junior engineers across the breadth and depth of the engineering organization
- You exemplify high accountability, integrity, and resilience, maintaining focus on both big-picture goals and the milestones needed to reach them
- You enable the engineering organization to innovate and deliver with greater speed and safety
- Experience with Jenkins, Terraform, Salt, Kubernetes (AKS), Azure, VMware, Splunk, Docker, Python, Bitbucket
- Experience with monitoring systems, tracing, and observability to manage large-scale systems and 24/7 availability
- Working with the Applications, Engineering, Platform, Operations, Infrastructure, and Cloud teams to ensure we are a premier software delivery organization
- Drive software engineering standard processes into all of our operational spaces by leading from the front and by example
- Passionate about leading a 24/7 engineering organization and enabling automation
- Lifelong learner who furthers diversity of thought in their approach to management
- Willingness to learn eCommerce and/or fulfillment operations and PDL business processes and interdependencies
- Driven to deliver high-quality results on time
- Demonstrated ability to deliver solutions that are easily maintainable, understandable, and diagnosable
- Able to write clear and consumable documentation
- Track record of building and running high-performance teams
- Expertise in analyzing complex application, database, network, and OS issues across a large-scale, distributed, customer-facing system
- System configuration management experience with automation tools such as Puppet, Chef, Ansible, or Salt
- Advanced experience with programming and/or scripting languages (Python, Java, Bash)
- Experience with DevOps tools, processes, and culture
- High-level technical understanding (development methodologies, phases, etc.)
- Proven track record in working collaboratively across functional areas to get results
- Interact effectively with representatives from Technology and Business Partners